IRC log of #schooltool for Tuesday, 2013-12-03

th1ahi replaceafill.22:40
replaceafillhey th1a22:40
th1aI'm talking to our friend Telly.22:40
replaceafillhow's she doing?22:40
th1aShe says "Had a meeting this morning22:41
th1aAnd my project is at risk of dying if I don't establish this database"22:41
th1aI think we need some kind of a form optimized for copy/pasting from her pdf's.22:42
th1aIt just needs to do some auto-numbering and things to cut steps out.22:43
replaceafillwhy don't we extract data from the pdfs?22:44
th1aTheir internal structure is just a mess.22:46
th1aAs I recall, the document structure and the text are just jumbled together.22:46
replaceafilli'd like to see one, do you have any?22:48
replaceafillor are they on the web?22:48
th1aDid you look at it before?22:48
replaceafilli don't think so22:49
replaceafilli've been "on pause" for the whole Telly story :)22:49
replaceafillwell, all i can identify are skillsets and skills maybe22:52
replaceafillunder element of competency/performance criteria tables?22:52
th1aThere's some higher structure but that's not the scary part.22:52
th1aFor the skills and skillsets if you could just copy/paste without having to click again, enter the number, click...22:55
replaceafillah, i see what you mean22:56
th1aSo she can pay for some coding if we can make it easier for someone there to just grind through.22:56
replaceafilli could play with the pdf tonight22:58
replaceafillif you think we should explore that route22:58
replaceafillor maybe you're already set on the copy/paste approach22:59
th1aWell, you can look at it.22:59
th1aI think you'll realize pretty quickly it's a mess, but maybe not.22:59
replaceafillin my head those tables become skillsets/skills, right?22:59
replaceafillthe rest of the hierarchy, not sure about it23:00
th1aYeah, but they aren't really tables in the pdf.23:00
replaceafillwe should ignore the whole Range Statement part, right?23:01
replaceafilland Evidence Guide23:01
th1aI'm ignoring it for now.23:02
replaceafillthis looks very similar to the Salvadoran standards list23:02
th1aThey got it from Trinidad.  It may be used all over Latin America.23:03
th1aWe just need a non-pdf version.23:03
replaceafillextracting pdf text, splitting from PERFORMANCE CRITERIA to RANGE STATEMENT23:05
replaceafillseems to get the list23:05
th1aWell, used throughout the Caribbean.23:05
th1aSplitting using what?23:05
replaceafillthey're just text now23:05
th1aI tried several pdf parsing things...23:06
replaceafillpdftotext + python?23:06
th1aI don't even remember.  Probably.23:06
th1aIs that what you're using?23:07
th1aPfft.  I guess that's why I'm the project manager.23:07
replaceafilli mean, it's not perfect, but it works23:08
replaceafillleaves some trash around23:08
replaceafillbut cleaning it is definitely faster than copy/paste ;)23:08
replaceafillok, i'll finish the levels work and will write a decent script later23:09
replaceafillbut i think it's doable23:10
replaceafillat least for *this* pdf :D23:10
th1aYeah, don't start now.23:10
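The extraction approach discussed above (pdftotext output, then splitting from PERFORMANCE CRITERIA to RANGE STATEMENT in Python) could look roughly like the sketch below. It is not the script replaceafill wrote. The function names are made up for illustration, and it assumes the two headings appear verbatim in the extracted text and that pdftotext from poppler-utils is installed. As noted in the log, real output would still need cleaning.

```python
import re
import subprocess

# Marker headings named in the log; assumed to appear verbatim in the text.
START = "PERFORMANCE CRITERIA"
END = "RANGE STATEMENT"


def pdf_to_text(path):
    """Run the pdftotext CLI and return the text it writes to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def criteria_blocks(text):
    """Return the non-empty lines found between each START/END pair."""
    pattern = re.compile(
        re.escape(START) + r"(.*?)" + re.escape(END), re.DOTALL
    )
    blocks = []
    for match in pattern.finditer(text):
        # Strip layout whitespace and drop blank lines inside the span.
        lines = [line.strip() for line in match.group(1).splitlines()]
        blocks.append([line for line in lines if line])
    return blocks
```

Usage would be something like criteria_blocks(pdf_to_text("course.pdf")), where the file name is a placeholder. Each returned block holds the raw lines for one unit, which still have to be sorted into elements and criteria by hand or by further rules.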
replaceafillbrb, rebooting...23:13
*** replaceafill has quit IRC23:13
*** replaceafill has joined #schooltool23:15
th1aOK, so fyi, that document is a single course, and the table at the beginning lays out the included units.23:27
replaceafillhave you thought of a doc structure for this?23:27
replaceafillUnit -> Skill Set -> Skill23:28
replaceafillUnits being "prepare for work, etc"23:28
th1aQuoting me:23:28
th1aSo from your point of view it is Course > Unit > Element of Competency > Performance Criteria23:28
th1aAnd then we need the official ID for the course and unit.23:28
th1aTelly: Correct.23:28
replaceafilloh, we can change names for SkillSets and Skills?23:29
replaceafillth1a, do you have the index page for the rest of the courses?23:33
th1aI asked Telly for more links.23:42
replaceafillah ok23:42
th1aThere are duplicates.23:48
th1aDuplicate units.23:48
th1aGood thing our model doesn't require a hierarchy.23:48
th1aWell, I'll ask if we need to duplicate them a la VA.23:49
replaceafillthat would be easier i think23:49
th1aThat is the sane way to do it.  ;-)23:50
th1aYeah, two courses point to the same unit.23:51
replaceafillsections skills will prevent that from being effective i think23:52
th1aIt will prevent the scores from showing up.23:52
replaceafillwe need the history view ;)23:52
th1aSome different units may have the same title but different id's so we work off ID's.23:53
replaceafillgot it23:54
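The structure agreed on above (Course > Unit > Element of Competency > Performance Criteria, with two courses able to point to the same unit and units matched by ID rather than title) can be sketched as follows. This is only an illustration, not SchoolTool's actual model. All class names, field names, and IDs in it are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Element of Competency, holding its Performance Criteria as strings."""
    title: str
    criteria: list = field(default_factory=list)


@dataclass
class Unit:
    unit_id: str  # official ID; titles may repeat, IDs may not
    title: str
    elements: list = field(default_factory=list)


@dataclass
class Course:
    course_id: str
    title: str
    unit_ids: list = field(default_factory=list)  # references, not copies


class Catalog:
    """Stores each unit once, keyed by ID, so courses can share units."""

    def __init__(self):
        self.units = {}
        self.courses = {}

    def add_unit(self, unit):
        # Keep the first unit registered under an ID; later duplicates reuse it.
        return self.units.setdefault(unit.unit_id, unit)

    def add_course(self, course, units):
        for unit in units:
            shared = self.add_unit(unit)
            course.unit_ids.append(shared.unit_id)
        self.courses[course.course_id] = course
        return course
```

Because courses hold unit IDs instead of nested unit objects, a duplicate unit in a second course resolves to the stored one, while two units with the same title but different IDs stay separate.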
replaceafilloh! code units have meanings23:55
replaceafillindustry, sector, sub-sector, even version control!23:55
th1aI don't think we need that hierarchy in the system though.23:56
replaceafilli just thought it's cool :P23:56
th1aWell, it helps!23:57

